NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Scaling Context-Aware Task Assistants that Learn from Demonstration and Adapt through Mixed-Initiative Dialogue;

Arakawa, Riku; Patidar, Prasoon; Page, Will; Lehman, Jill F; Goel, Mayank (October 2025, ACM)

Full Text Available
PrISM-Q&A: Step-Aware Voice Assistant on a Smartwatch Enabled by Multimodal Procedure Tracking and Large Language Models

https://doi.org/10.1145/3699759

Arakawa, Riku; Lehman, Jill Fain; Goel, Mayank (November 2024, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies)

Voice assistants capable of answering user queries during various physical tasks have shown promise in guiding users through complex procedures. However, users often find it challenging to articulate their queries precisely, especially when unfamiliar with the specific terminologies required for machine-oriented tasks. We introduce PrISM-Q&A, a novel question-answering (Q&A) interaction termed step-aware Q&A, which enhances the functionality of voice assistants on smartwatches by incorporating Human Activity Recognition (HAR) and providing the system with user context. It continuously monitors user behavior during procedural tasks via audio and motion sensors on the watch and estimates which step the user is performing. When a question is posed, this contextual information is supplied to Large Language Models (LLMs) as part of the context used to generate a response, even in the case of inherently vague questions like What should I do next with this? Our studies confirmed that users preferred the convenience of our approach compared to existing voice assistants. Our real-time assistant represents the first Q&A system that provides contextually situated support during tasks without camera use, paving the way for the ubiquitous, intelligent assistant.
more » « less
Full Text Available
Bring Privacy To The Table: Interactive Negotiation for Privacy Settings of Shared Sensing Devices

https://doi.org/10.1145/3613904.3642897

Zhou, Haozhe; Goel, Mayank; Agarwal, Yuvraj (May 2024, ACM)

To address privacy concerns with the Internet of Things (IoT) devices, researchers have proposed enhancements in data collection transparency and user control. However, managing privacy preferences for shared devices with multiple stakeholders remains challenging. We introduced ThingPoll, a system that helps users negotiate privacy configurations for IoT devices in shared settings. We designed ThingPoll by observing twelve participants verbally negotiating privacy preferences, from which we identified potentially successful and inefficient negotiation patterns. ThingPoll bootstraps a preference model from a custom crowdsourced privacy preferences dataset. During negotiations, ThingPoll strategically scaffolds the process by eliciting users’ privacy preferences, providing helpful contexts, and suggesting feasible configuration options. We evaluated ThingPoll with 30 participants negotiating the privacy settings of 4 devices. Using ThingPoll, participants reached an agreement in 97.5% of scenarios within an average of 3.27 minutes. Participants reported high overall satisfaction of 83.3% with ThingPoll as compared to baseline approaches.
more » « less
Full Text Available
Kirigami: Lightweight Speech Filtering for Privacy-Preserving Activity Recognition using Audio

https://doi.org/10.1145/3643502

Boovaraghavan, Sudershan; Zhou, Haozhe; Goel, Mayank; Agarwal, Yuvraj (March 2024, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies)

Audio-based human activity recognition (HAR) is very popular because many human activities have unique sound signatures that can be detected using machine learning (ML) approaches. These audio-based ML HAR pipelines often use common featurization techniques, such as extracting various statistical and spectral features by converting time domain signals to the frequency domain (using an FFT) and using them to train ML models. Some of these approaches also claim privacy benefits by preventing the identification of human speech. However, recent deep learning-based automatic speech recognition (ASR) models pose new privacy challenges to these featurization techniques. In this paper, we systematically evaluate various featurization approaches for audio data, assessing their privacy risks through metrics like speech intelligibility (PER and WER) while considering the utility tradeoff in terms of ML-based activity recognition accuracy. Our findings reveal the susceptibility of these approaches to speech content recovery when exposed to recent ASR models, especially under re-tuning or retraining conditions. Notably, fine-tuned ASR models achieved an average Phoneme Error Rate (PER) of 39.99% and Word Error Rate (WER) of 44.43% in speech recognition for these approaches. To overcome these privacy concerns, we propose Kirigami, a lightweight machine learning-based audio speech filter that removes human speech segments reducing the efficacy of ASR models (70.48% PER and 101.40% WER) while also maintaining HAR accuracy (76.0% accuracy). We show that Kirigami can be implemented on common edge microcontrollers with limited computational capabilities and memory, providing a path to deployment on a variety of IoT devices. Finally, we conducted a real-world user study and showed the robustness of Kirigami on a laptop and an ARM Cortex-M4F microcontroller under three different background noises.
more » « less
Full Text Available
VAX: Using Existing Video and Audio-based Activity Recognition Models to Bootstrap Privacy-Sensitive Sensors

https://doi.org/10.1145/3610907

Patidar, Prasoon; Goel, Mayank; Agarwal, Yuvraj (September 2023, Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies)

The use of audio and video modalities for Human Activity Recognition (HAR) is common, given the richness of the data and the availability of pre-trained ML models using a large corpus of labeled training data. However, audio and video sensors also lead to significant consumer privacy concerns. Researchers have thus explored alternate modalities that are less privacy-invasive such as mmWave doppler radars, IMUs, motion sensors. However, the key limitation of these approaches is that most of them do not readily generalize across environments and require significant in-situ training data. Recent work has proposed cross-modality transfer learning approaches to alleviate the lack of trained labeled data with some success. In this paper, we generalize this concept to create a novel system called VAX (Video/Audio to 'X'), where training labels acquired from existing Video/Audio ML models are used to train ML models for a wide range of 'X' privacy-sensitive sensors. Notably, in VAX, once the ML models for the privacy-sensitive sensors are trained, with little to no user involvement, the Audio/Video sensors can be removed altogether to protect the user's privacy better. We built and deployed VAX in ten participants' homes while they performed 17 common activities of daily living. Our evaluation results show that after training, VAX can use its onboard camera and microphone to detect approximately 15 out of 17 activities with an average accuracy of 90%. For these activities that can be detected using a camera and a microphone, VAX trains a per-home model for the privacy-preserving sensors. These models (average accuracy = 84%) require no in-situ user input. In addition, when VAX is augmented with just one labeled instance for the activities not detected by the VAX A/V pipeline (~2 out of 17), it can detect all 17 activities with an average accuracy of 84%. Our results show that VAX is significantly better than a baseline supervised-learning approach of using one labeled instance per activity in each home (average accuracy of 79%) since VAX reduces the user burden of providing activity labels by 8x (~2 labels vs. 17 labels).
more » « less
Full Text Available
uKnit: A Position-Aware Reconfigurable Machine-Knitted Wearable for Gestural Interaction and Passive Sensing using Electrical Impedance Tomography

https://doi.org/10.1145/3544548.3580692

Yu, Tianhong Catherine; Arakawa, Riku; McCann, James; Goel, Mayank (April 2023, Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems)

A scarf is inherently reconfigurable: wearers often use it as a neck wrap, a shawl, a headband, a wristband, and more. We developed uKnit, a scarf-like soft sensor with scarf-like reconfigurability, built with machine knitting and electrical impedance tomography sensing. Soft wearable devices are comfortable and thus attractive for many human-computer interaction scenarios. While prior work has demonstrated various soft wearable capabilities, each capability is device- and location-specific, being incapable of meeting users’ various needs with a single device. In contrast, uKnit explores the possibility of one-soft-wearable-for-all. We describe the fabrication and sensing principles behind uKnit, demonstrate several example applications, and evaluate it with 10-participant user studies and a washability test. uKnit achieves 88.0%/78.2% accuracy for 5-class worn-location detection and 80.4%/75.4% accuracy for 7-class gesture recognition with a per-user/universal model. Moreover, it identifies respiratory rate with an error rate of 1.25 bpm and detects binary sitting postures with an average accuracy of 86.2%.
more » « less
Full Text Available

Search for: All records